Abstract: Organizations commonly run separate systems to perform computations on data sets drawn from various sources, but individuals and small organizations may lack the resources to do so. Reusing intermediate results across further computations is an important emerging class of applications. This paper addresses the problem of sharing data between different organizations and applications so that their computations can be optimized. In addition to sharing large amounts of raw data, intermediate or preliminary results from the data pipeline can be shared with multiple organizations. Any organization that handles data typically goes through ETL (extract, transform, load) steps: it collects the data, pre-processes it, and then performs computations on it. If another organization wishes to work with the same data set, it has to repeat the collection, pre-processing, and computation steps. The proposed system helps organizations and end users by performing the ETL steps and basic processing once, on its side, and making the resulting intermediate computation results available. Other organizations and individuals can then retrieve these results, through REST APIs or another access mechanism, for further application-specific processing.
Keywords: shared data store, shared computation, big data, sentiment analysis, image tagging, healthcare.